


Towards deployment-centric multimodal AI beyond vision and language

Liu, Xianyuan, Zhang, Jiayang, Zhou, Shuo, van der Plas, Thijs L., Vijayaraghavan, Avish, Grishina, Anastasiia, Zhuang, Mengdie, Schofield, Daniel, Tomlinson, Christopher, Wang, Yuhan, Li, Ruizhe, van Zeeland, Louisa, Tabakhi, Sina, Demeocq, Cyndie, Li, Xiang, Das, Arunav, Timmerman, Orlando, Baldwin-McDonald, Thomas, Wu, Jinge, Bai, Peizhen, Sahili, Zahraa Al, Alwazzan, Omnia, Do, Thao N., Suvon, Mohammod N. I., Wang, Angeline, Cipolina-Kun, Lucia, Moretti, Luigi A., Farndale, Lucas, Jain, Nitisha, Efremova, Natalia, Ge, Yan, Varela, Marta, Lam, Hak-Keung, Celiktutan, Oya, Evans, Ben R., Coca-Castro, Alejandro, Wu, Honghan, Abdallah, Zahraa S., Chen, Chen, Danchev, Valentin, Tkachenko, Nataliya, Lu, Lei, Zhu, Tingting, Slabaugh, Gregory G., Moore, Roger K., Cheung, William K., Charlton, Peter H., Lu, Haiping

arXiv.org Artificial Intelligence

Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
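The abstract's central prescription, incorporating deployment constraints early rather than after model development, can be made concrete with a small sketch. The code below is not from the paper; every class, field, and threshold is a hypothetical illustration of screening candidate multimodal models against deployment constraints at the start of a workflow, so undeployable options are filtered out before training effort is spent.

```python
# Illustrative sketch (assumptions, not the paper's method): screen candidate
# multimodal models against deployment constraints early in the workflow.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    modalities: tuple        # e.g. ("imaging", "clinical_text")
    params_millions: float   # model size
    latency_ms: float        # estimated inference latency
    needs_gpu: bool


@dataclass
class DeploymentConstraints:
    max_params_millions: float
    max_latency_ms: float
    gpu_available: bool
    required_modalities: set


def deployable(c: Candidate, d: DeploymentConstraints) -> bool:
    """Reject candidates that cannot meet the deployment constraints."""
    return (
        c.params_millions <= d.max_params_millions
        and c.latency_ms <= d.max_latency_ms
        and (d.gpu_available or not c.needs_gpu)
        and d.required_modalities.issubset(set(c.modalities))
    )


# Hypothetical example: an edge-deployed pandemic-response triage model.
constraints = DeploymentConstraints(
    max_params_millions=500, max_latency_ms=200, gpu_available=False,
    required_modalities={"imaging", "clinical_text"},
)
candidates = [
    Candidate("fusion-large", ("imaging", "clinical_text"), 7000, 900, True),
    Candidate("fusion-lite", ("imaging", "clinical_text"), 120, 80, False),
]
print([c.name for c in candidates if deployable(c, constraints)])
# -> ['fusion-lite']
```

The point of the sketch is only that the constraint check runs before any model is trained, which is what distinguishes a deployment-centric workflow from purely data-centric or model-centric ones.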


Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation

Li, Zehan, Deng, Jinzhi, Ma, Haibing, Zhang, Chi, Xiao, Dan

arXiv.org Artificial Intelligence

This paper introduces the Translational Evaluation of Multimodal AI for Inspection (TEMAI) framework, bridging multimodal AI capabilities with industrial inspection implementation. Adapting translational research principles from healthcare to industrial contexts, TEMAI establishes three core dimensions: Capability (technical feasibility), Adoption (organizational readiness), and Utility (value realization). The framework demonstrates that technical capability alone yields limited value without corresponding adoption mechanisms. TEMAI incorporates specialized metrics, including the Value Density Coefficient, as well as structured implementation pathways. Empirical validation through retail and photovoltaic inspection implementations revealed significant differences in value-realization patterns despite similar capability reduction rates, confirming the framework's effectiveness across diverse industrial sectors while highlighting the importance of industry-specific adaptation strategies.

Keywords: Multimodal AI, Industrial Inspection, Translational Framework, TEMAI

Industrial inspection tasks are fundamental to ensuring operational continuity and safety in manufacturing sectors, serving as a cornerstone for preventive maintenance and risk mitigation. These tasks, however, are plagued by systemic inefficiencies, including labor-intensive workflows, hazardous working environments (e.g., high-temperature zones or toxic gas exposure), and heavy reliance on empirical knowledge that is difficult to standardize or transfer across industries [1]. Despite incremental advances in automation technologies such as drones, AR-assisted devices, and IoT-enabled sensors, the integration of these tools into inspection workflows has yielded limited returns due to fragmented deployment, high implementation costs, and insufficient interoperability between hardware and software systems [2]. For instance, while drones have reduced human exposure to dangerous environments in power grid inspections, their operational scope remains constrained by battery life and data-processing bottlenecks [3].
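The excerpt names TEMAI's three dimensions but gives no formulas for scoring them or for the Value Density Coefficient. The sketch below is therefore purely hypothetical: the [0, 1] scale, the weights, and the geometric-mean combination are all assumptions chosen to illustrate one property the abstract does state, namely that high capability yields little value without adoption.

```python
# Hypothetical scoring sketch for TEMAI's three dimensions. The paper excerpt
# does not define how Capability, Adoption, and Utility are quantified;
# everything here is an illustrative assumption, not the authors' metric
# and not their Value Density Coefficient.

def temai_score(capability: float, adoption: float, utility: float) -> float:
    """Combine the three dimension scores (each assumed to lie in [0, 1]).

    A geometric mean is used so that a near-zero score in any single
    dimension drags the total down, mirroring the claim that technical
    capability alone yields limited value without adoption mechanisms.
    """
    for v in (capability, adoption, utility):
        if not 0.0 <= v <= 1.0:
            raise ValueError("dimension scores are assumed to lie in [0, 1]")
    return (capability * adoption * utility) ** (1 / 3)


# High capability but poor organizational adoption -> low overall value.
print(round(temai_score(0.9, 0.1, 0.8), 3))  # 0.416
# A balanced profile scores higher despite a lower capability peak.
print(round(temai_score(0.7, 0.7, 0.7), 3))  # 0.7
```

An additive weighted sum would not show this effect, since strong capability could compensate for absent adoption; that is the only reason a multiplicative combination is used in this sketch.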


ACE, Action and Control via Explanations: A Proposal for LLMs to Provide Human-Centered Explainability for Multimodal AI Assistants

Watkins, Elizabeth Anne, Moss, Emanuel, Manuvinakurike, Ramesh, Shi, Meng, Beckwith, Richard, Raffa, Giuseppe

arXiv.org Artificial Intelligence

In this short paper we address issues related to building multimodal AI systems for human performance support in manufacturing domains. We make two contributions: first, we identify challenges in the participatory design and training of such systems; second, to address these challenges, we propose the ACE paradigm: "Action and Control via Explanations". Specifically, we suggest that LLMs can produce explanations in the form of human-interpretable "semantic frames", which in turn enable end users to provide the data the AI system needs to align its multimodal models and representations, including computer vision, automatic speech recognition, and document inputs. By using LLMs to "explain" through semantic frames, ACE will help the human and the AI system collaborate, together building a more accurate model of human activities and behaviors, and ultimately more accurate predictive outputs for better task support and better outcomes for human users performing manual tasks.
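The abstract does not specify what a semantic frame contains, so the sketch below is an assumed schema: the fields, the example frame, and the correction loop are illustrative choices showing how an LLM-rendered explanation could let an end user confirm or correct the system's belief, turning that correction into alignment data for the multimodal models.

```python
# Illustrative sketch of the "semantic frame" explanations ACE proposes.
# The paper gives no schema; every field and value here is a hypothetical
# example of the confirm-or-correct loop the abstract describes.
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    action: str                    # what the system believes the worker is doing
    actor: str
    objects: list = field(default_factory=list)
    source_modalities: list = field(default_factory=list)  # e.g. vision, ASR
    confidence: float = 0.0


def explain(frame: SemanticFrame) -> str:
    """Render the frame as a human-interpretable explanation."""
    return (f"I think {frame.actor} is performing '{frame.action}' "
            f"on {', '.join(frame.objects)} "
            f"(based on {', '.join(frame.source_modalities)}; "
            f"confidence {frame.confidence:.0%}). Is that right?")


def apply_user_correction(frame: SemanticFrame, corrected_action: str) -> dict:
    """A user's correction becomes a labeled example for realignment."""
    return {"predicted": frame.action, "corrected": corrected_action,
            "modalities": frame.source_modalities}


frame = SemanticFrame(action="tighten bolt", actor="the operator",
                      objects=["torque wrench", "flange"],
                      source_modalities=["computer vision", "speech"],
                      confidence=0.72)
print(explain(frame))
print(apply_user_correction(frame, corrected_action="loosen bolt"))
```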


Global Big Data Conference

#artificialintelligence

Earlier this month, researchers at the Allen Institute for AI -- a nonprofit founded by the late Microsoft cofounder Paul Allen -- released an interactive demo of a system they describe as part of a "new generation" of AI applications that can analyze, search across, and respond to questions about videos "at scale." Called Merlot Reserve, the system "watched" 20 million YouTube videos to learn the relationships between images, sounds, and subtitles, allowing it to answer questions such as "What meal does the person in the video want to eat?" or "Has the boy in this video swam in the ocean before?" Systems that can process and relate information from audio, visuals, and text have been around for years, and they continue to improve in their ability to understand the world more like humans do. San Francisco research lab OpenAI's DALL-E, released in 2021, can generate images of objects -- real or imagined -- from simple text descriptions like "an armchair in the shape of an avocado."